Existing analyses of neural network training often operate under the unrealistic assumption of an extremely small learning rate. This lies in stark contrast to practical wisdom and empirical studies, such as the work of J. Cohen et al. (ICLR 2021), which exhibit startling new phenomena (the "edge of stability" or "unstable convergence") and potential benefits for generalization in the large learning rate regime. Despite a flurry of recent works on this topic, however, the latter effect is still poorly understood. In this paper, we take a step towards understanding genuinely non-convex training dynamics with large learning rates by performing a detailed analysis of gradient descent for simplified models of two-layer neural networks. For these models, we provably establish the edge of stability phenomenon and discover a sharp phase transition for the step size below which the neural network fails to learn "threshold-like" neurons (i.e., neurons with a non-zero first-layer bias). This elucidates one possible mechanism by which the edge of stability can in fact lead to better generalization, as threshold neurons are basic building blocks with useful inductive bias for many tasks.
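The large-step-size behavior described above can be probed numerically. The following is a minimal, self-contained sketch, not the paper's simplified model: it runs full-batch gradient descent on a small two-layer ReLU network fit to a toy thresholding task and estimates the sharpness (largest Hessian eigenvalue) by power iteration on finite-difference Hessian-vector products. The data, network width, step sizes, and iteration counts are all illustrative assumptions; with a sufficiently large step size one would expect the sharpness to hover near $2/\eta$ rather than settle below it, though the exact behavior of this toy run may vary.

```python
# Toy edge-of-stability experiment (illustrative only, not the paper's model):
# full-batch gradient descent on a tiny two-layer ReLU network, tracking the
# sharpness (largest Hessian eigenvalue) via power iteration on
# finite-difference Hessian-vector products.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 2))                     # toy inputs
y = (X[:, 0] > 0.3).astype(float)                # "threshold-like" labels

WIDTH, DIM = 8, 2

def unpack(theta):
    W = theta[:WIDTH * DIM].reshape(WIDTH, DIM)  # first-layer weights
    b = theta[WIDTH * DIM:WIDTH * DIM + WIDTH]   # first-layer biases
    a = theta[WIDTH * DIM + WIDTH:]              # second-layer weights
    return W, b, a

def loss(theta):
    W, b, a = unpack(theta)
    h = np.maximum(X @ W.T + b, 0.0)             # ReLU features
    return 0.5 * np.mean((h @ a - y) ** 2)

def grad(theta, eps=1e-5):
    # central finite differences; adequate for a 32-parameter sketch
    g = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = eps
        g[i] = (loss(theta + e) - loss(theta - e)) / (2 * eps)
    return g

def sharpness(theta, iters=30, eps=1e-4):
    # largest Hessian eigenvalue via power iteration on Hessian-vector products
    v = rng.normal(size=theta.size)
    v /= np.linalg.norm(v)
    lam = 0.0
    for _ in range(iters):
        hv = (grad(theta + eps * v) - grad(theta - eps * v)) / (2 * eps)
        lam = float(v @ hv)
        v = hv / (np.linalg.norm(hv) + 1e-12)
    return lam

theta0 = rng.normal(scale=0.5, size=WIDTH * DIM + 2 * WIDTH)
for eta in (0.01, 0.2):                          # small vs. large step size
    theta = theta0.copy()
    for _ in range(200):
        theta -= eta * grad(theta)
    print(f"eta={eta}: loss={loss(theta):.4f}, "
          f"sharpness={sharpness(theta):.2f}, 2/eta={2 / eta:.1f}")
```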
We provide theoretical convergence guarantees for score-based generative models (SGMs), such as denoising diffusion probabilistic models (DDPMs), which form the backbone of large-scale real-world generative models such as DALL$\cdot$E 2. Our main result is that, assuming accurate score estimates, such SGMs can efficiently sample from essentially any realistic data distribution. In contrast to prior works, our results (1) hold for an $L^2$-accurate score estimate (rather than $L^\infty$-accurate); (2) do not require restrictive functional inequality conditions that preclude substantial non-log-concavity; (3) scale polynomially in all relevant problem parameters; and (4) match state-of-the-art complexity guarantees for discretizations of the Langevin diffusion, provided that the score error is sufficiently small. We view this as strong theoretical justification for the empirical success of SGMs. We also examine SGMs based on the critically damped Langevin diffusion (CLD). Contrary to conventional wisdom, we provide evidence that the use of the CLD does not reduce the complexity of SGMs.
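To make the object of the analysis concrete, here is a minimal sketch of the kind of sampler such guarantees concern: an Ornstein-Uhlenbeck forward noising process run to time T, followed by an Euler-Maruyama discretization of the reverse SDE driven by a score function. In place of a learned network, the sketch plugs in the exact score of a one-dimensional two-component Gaussian mixture; the mixture, horizon, and step counts are illustrative assumptions, and this simple discretization is meant only as an illustration, not necessarily the exact scheme analyzed in the paper.

```python
# Illustrative reverse-diffusion sampler: OU forward process, Euler-Maruyama
# discretization of the reverse SDE, with the exact mixture score standing in
# for a learned score network. All constants are illustrative.
import numpy as np

rng = np.random.default_rng(0)
means, std = np.array([-2.0, 2.0]), 0.5          # target: two-component Gaussian mixture

def score_t(x, t):
    # Exact score of the OU marginal at time t when X_0 follows the mixture:
    # X_t = e^{-t} X_0 + sqrt(1 - e^{-2t}) Z, so each component has mean
    # e^{-t} mu_i and variance std^2 e^{-2t} + (1 - e^{-2t}).
    m = means * np.exp(-t)
    s2 = std**2 * np.exp(-2 * t) + (1 - np.exp(-2 * t))
    logw = -(x[:, None] - m) ** 2 / (2 * s2)     # unnormalized log posterior weights
    logw -= logw.max(axis=1, keepdims=True)
    w = np.exp(logw)
    w /= w.sum(axis=1, keepdims=True)
    return (w * (m - x[:, None]) / s2).sum(axis=1)

T, N, n_samples = 5.0, 500, 2000
h = T / N
x = rng.normal(size=n_samples)                   # initialize at the OU stationary law N(0, 1)
for k in range(N):
    t = T - k * h                                # current (forward) time
    drift = x + 2 * score_t(x, t)                # reverse-SDE drift for forward dX = -X dt + sqrt(2) dB
    x = x + h * drift + np.sqrt(2 * h) * rng.normal(size=n_samples)

# the samples should now resemble the bimodal target distribution
print("sample mean:", x.mean(), " sample std:", x.std())
```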
Classically, the continuous-time Langevin diffusion converges exponentially fast to its stationary distribution $\pi$ under the sole assumption that $\pi$ satisfies a Poincaré inequality. Using this fact to provide guarantees for the discrete-time Langevin Monte Carlo (LMC) algorithm, however, is considerably more challenging due to the need to work with chi-squared or Rényi divergences, and prior works have largely focused on log-concave targets. In this work, we provide the first convergence guarantees for LMC assuming that $\pi$ satisfies either a Latała-Oleszkiewicz or modified log-Sobolev inequality, which interpolates between the Poincaré and log-Sobolev settings. Unlike existing works, our results allow for weak smoothness and do not require convexity or dissipativity conditions.
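For reference, the discrete-time algorithm in question is simple to state: starting from any $x_0$, LMC iterates $x_{k+1} = x_k - h \nabla V(x_k) + \sqrt{2h}\,\xi_k$ with $\xi_k \sim \mathcal{N}(0, I)$, targeting $\pi \propto e^{-V}$. A minimal sketch follows; the double-well potential, step size, and iteration count are illustrative assumptions rather than anything taken from the paper.

```python
# Illustrative Langevin Monte Carlo run on a non-convex target
# pi(x) proportional to exp(-V(x)), with V a smooth double well.
import numpy as np

rng = np.random.default_rng(0)

def grad_V(x):
    # V(x) = (x^2 - 1)^2 / 4: smooth, non-convex double-well potential
    return x * (x**2 - 1)

h, n_iters, n_chains = 0.01, 5000, 1000
x = rng.normal(size=n_chains)                    # many independent chains, for a quick empirical check
for _ in range(n_iters):
    x = x - h * grad_V(x) + np.sqrt(2 * h) * rng.normal(size=n_chains)

# the empirical law of the chains approximates pi, up to discretization bias
print("fraction of samples within 0.5 of a well:", np.mean(np.abs(np.abs(x) - 1.0) < 0.5))
```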
For any convex body $K \subseteq \mathbb{R}^n$, S. Bubeck and R. Eldan introduced the entropic barrier on $K$ and showed that it is a $(1+o(1))\,n$-self-concordant barrier. In this note, we observe that the optimal bound of $n$ on the self-concordance parameter holds as a consequence of the dimensional Brascamp-Lieb inequality.
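For readers unfamiliar with the object, here is a minimal LaTeX rendering of the relevant definitions as they appear in Bubeck and Eldan's construction (the normalization conventions here are an assumption, not quoted from the note):

```latex
% Entropic barrier on a convex body K \subseteq R^n: the Fenchel conjugate f^*
% of the log-partition function f of exponential tilts of the uniform measure on K.
\[
  f(\theta) = \log \int_K e^{\langle \theta, x \rangle} \, \mathrm{d}x ,
  \qquad
  f^*(x) = \sup_{\theta \in \mathbb{R}^n}
    \bigl\{ \langle \theta, x \rangle - f(\theta) \bigr\} .
\]
% A barrier F is \nu-self-concordant if it is self-concordant and satisfies
\[
  \langle \nabla F(x), [\nabla^2 F(x)]^{-1} \nabla F(x) \rangle \le \nu
  \quad \text{for all } x \in \operatorname{int} K ;
\]
% the note's claim is that \nu = n suffices for F = f^*.
```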